Natural language understanding in controlled virtual environments

Author

  • Patrick Ye
Abstract

[Table 3.5: Differences between Word Sense Resources. The resources are compared on their indexing method (word vs. word and sense), coverage, sense origin (linguistic intuition, corpus examples, psycholinguistic knowledge) and compatibility.]

[Example 6: VerbNet Frame Example: "Paula hit the ball"]

3.9.2 Overview of Commonly Used WSD Techniques

In the broadest sense, there are two types of WSD systems: supervised and unsupervised. A prototypical supervised WSD system must rely on sense-labelled training examples as its primary source of disambiguation information; as a result, the quality and quantity of the training data have always been the bottleneck of supervised WSD systems. In contrast, a prototypical unsupervised WSD system does not rely on sense-labelled training data, and instead relies on machine-readable dictionaries, thesauri and unlabelled corpora as its main sources of disambiguation information (although it is also possible for an unsupervised WSD system to use labelled training examples). From the point of view of coverage, unsupervised WSD systems tend to be able to handle a much greater number of polysemous words than supervised systems, because of the scarcity of sense-labelled data. From the point of view of performance, however, supervised systems tend to perform much better than unsupervised systems on words for which training data is available. For the purpose of this research, the choice of WSD system will depend on the nature of the movie scripts, and this issue will be discussed in detail in chapter ?? where the movie script corpus is described.

General Context Based Word Sense Disambiguation

The most common source of disambiguation features for this type of WSD system is the words surrounding the ambiguous word (Yarowsky 1995, Yarowsky 1993, Gale et al. 1992). Using just the surrounding words, it is possible to create the following disambiguation features:

1. n-grams of the lemmatized surrounding words within a window of K word tokens to the left and right of the target polysemous word
2. n-grams of the Part of Speech (POS) tags of the surrounding words within a window of K word tokens to the left and right of the target polysemous word

To illustrate the n-gram related features, consider the ambiguous word bank in the sentence "I went to the bank to apply for a home loan.". If the window size is chosen to be 5 words, i.e. 5 words to the left of bank and 5 words to the right of bank, then the following n-gram features could be extracted:

uni-gram: ("SENTENCE START"), ("I"), ("go"), ("to"), ("the"), ("to"), ("apply"), ("for"), ("a"), ("home"), ("loan")
bi-gram: ("SENTENCE START", "I"), ("I", "go"), ("go", "to"), ("to", "the"), ("to", "apply"), ("apply", "for"), ("for", "a"), ("a", "home"), ("home", "loan")

It is also possible to include the relative positions of the n-grams as part of the feature. For example, a uni-gram feature of ("apply") could become ("apply", +) to indicate that it is to the right of the target word, or even ("apply", 2) to indicate that it is the second word token to the right of the target word.
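To make these feature templates concrete, the following is a minimal sketch, written under my own assumptions rather than taken from any of the cited systems, of how the window-based uni-gram and bi-gram features (optionally with relative positions) could be extracted. The function name is illustrative, and an upstream lemmatizer is assumed to have already produced the lemmas.

```python
# A minimal sketch of the window-based n-gram features described above.
# The lemmas are assumed to come from an upstream lemmatizer; the function
# name and interface are illustrative, not taken from any cited system.

def window_ngram_features(lemmas, target_index, k=5, with_position=False):
    """Uni-gram and bi-gram features from a window of k tokens around the target."""
    padded = ["SENTENCE_START"] + list(lemmas) + ["SENTENCE_END"]
    t = target_index + 1  # account for the padding token
    left = padded[max(0, t - k):t]
    right = padded[t + 1:t + 1 + k]

    features = []
    # uni-grams, optionally annotated with their signed offset from the target word
    for offset, lemma in enumerate(left, start=-len(left)):
        features.append((lemma, offset) if with_position else (lemma,))
    for offset, lemma in enumerate(right, start=1):
        features.append((lemma, offset) if with_position else (lemma,))
    # bi-grams taken within each side of the window (none straddle the target word)
    for side in (left, right):
        features.extend(zip(side, side[1:]))
    return features

# "I went to the bank to apply for a home loan." after lemmatization;
# the target word "bank" is at index 4.
lemmas = ["I", "go", "to", "the", "bank", "to", "apply", "for", "a", "home", "loan"]
print(window_ngram_features(lemmas, 4, k=5, with_position=True))
```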
With other commonly available natural language processing tools, it would not be difficult to include higher-level features such as:

1. WordNet taxonomy information for the surrounding words
2. Chunking information
3. Parse tree information
4. Named Entity (NE) information
5. Anaphora information

Most general-purpose WSD systems employ a particular combination of empirical models, and train classifiers with a set of features similar to the ones mentioned above. Examples of such WSD systems include Yarowsky (1995), Ng and Lee (1996) and Cabezas et al. (2001). On top of these features, two additional heuristics are often used in general-purpose WSD systems to provide additional disambiguation information when the genre or topic of the sentence containing the ambiguous word is available.

The first is the one sense per collocation (OSPC) heuristic (Yarowsky 1993). It states that if a word token c is found to occur within the context of a particular sense s_i of a polysemous word w, then it is highly likely that whenever c co-occurs with w, s_i will be w's correct sense. For example, consider the polysemous word plant with the following two senses:

plant_1: buildings for carrying on industrial labour
plant_2: a living organism lacking the power of locomotion

The first sense of plant is likely to co-occur with words such as "factory", "worker", "engineer" and "machinery", whereas the second sense is likely to co-occur with words such as "flower", "root" and "leaf". As a result, these words tend to have sufficient disambiguation power to determine the correct sense within given contexts.

The second heuristic is one sense per discourse (OSPD) (Gale et al. 1992). It states that polysemous words tend to exhibit only one sense in a given discourse. For example, consider the noun plant in the excerpt taken from a news article ("A Worker Recalls the Chernobyl Disaster", written by Anna Melnichuk and published by The Associated Press on Tuesday, April 25, 2006) shown in Figure 7. The word "engineer" is a good indication that the first sense is being used for the first occurrence of plant; with the OSPD heuristic, it can then be correctly determined that the first sense also applies to the second occurrence of plant.

One seminal WSD paper that exemplifies the use of n-gram based features, the OSPC heuristic and the OSPD heuristic is Yarowsky (1995). This paper introduces a semi-unsupervised system which learns to disambiguate polysemous words from a small amount of labelled training data and a large amount of unlabelled training data by applying the above two heuristics. Yarowsky's system starts with a few gold-standard-labelled examples as the seeds of the bootstrapping process. It then trains a classifier based on these seeds, and uses it to classify the senses of the unlabelled data. Once this classification is finished, the one-sense-per-discourse heuristic and a probability threshold are used to select a new set of highly-ranked examples from the classified data. These new examples are then used to train the next classifier. This cycle of training, analyzing and retraining keeps repeating until either all the data have been labelled or the re-analysis stage produces no more new highly-ranked examples. Yarowsky's system is elegant because it does not require a large amount of gold-standard labelled training data but can still achieve results comparable to systems that do. The features used in this system are just word collocations of the polysemous words.
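The bootstrapping cycle just described can be rendered as the following sketch. It assumes a generic probabilistic classifier rather than Yarowsky's original decision-list learner, and the example attributes (features, discourse_id) and the train_classifier callable are hypothetical placeholders.

```python
# A schematic sketch of the Yarowsky-style bootstrapping loop described above,
# assuming a generic probabilistic classifier (not the original decision lists).
# The example attributes and the train_classifier callable are hypothetical.

def bootstrap(seeds, unlabelled, train_classifier, threshold=0.95):
    """Grow a labelled set from seed (features, sense) pairs and an unlabelled pool."""
    labelled = list(seeds)
    pool = list(unlabelled)
    while pool:
        classifier = train_classifier(labelled)
        confident, remaining = [], []
        for example in pool:
            sense, prob = classifier.predict(example.features)
            if prob >= threshold:
                confident.append((example, sense))
            else:
                remaining.append(example)
        if not confident:
            break  # the re-analysis stage produced no new highly-ranked examples
        # One sense per discourse: propagate a confidently assigned sense to the
        # other occurrences of the word in the same discourse.
        discourse_sense = {ex.discourse_id: sense for ex, sense in confident}
        pool = []
        for example in remaining:
            if example.discourse_id in discourse_sense:
                confident.append((example, discourse_sense[example.discourse_id]))
            else:
                pool.append(example)
        labelled.extend((ex.features, sense) for ex, sense in confident)
    return labelled
```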
The evaluation data for Yarowsky's system consists of 11 polysemous nouns and 1 polysemous verb extracted from a 460 million word corpus (the Yarowsky paper does not mention the name of the corpus), with two senses per word. The disambiguation accuracy for these words ranged between 93.6% and 98.8%, with an average accuracy of 96.5%. However, as all of the test words have only 2 senses, it can be argued that Yarowsky's disambiguation task may be too easy.

Mihalcea and Csomai (2005) describe another general context based WSD system called SenseLearner. Unlike Yarowsky's system, SenseLearner was a semi-supervised system that could handle a much broader range of polysemous words. Even though SenseLearner was designed to use as little annotated data as possible, it still required a certain amount of gold-standard training data, which came from Semcor (Miller et al. 1993). SenseLearner makes use of a set of predefined word categories, which are groups of words sharing some common syntactic and semantic properties. Each verb category can handle the disambiguation of a specific group of verbs. With the categories, SenseLearner trains a contextual semantic model and a collocation semantic model, which can be used independently for the disambiguation of all the Semcor verbs covered by the verb categories. The contextual model treats the lemmas and POS tags of the words immediately surrounding the target verb as a bag of words. The collocation model uses the same surrounding words, but treats them as bi-grams; for example, given the target verb say in "the judge said yesterday ...", the collocations will be judge said and said yesterday. For verbs which are not in Semcor or not covered by the word categories, the majority sense as defined by WordNet is used.
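As a rough illustration of the two feature views just described (a sketch under my own assumptions, not SenseLearner's actual implementation), the snippet below builds a bag-of-words representation from the tagged words around a target verb and the target-anchored bi-gram collocations; the token representation and function names are illustrative, and in practice the contextual model would use lemmas rather than surface forms.

```python
# Illustrative sketch (not SenseLearner's code) of a contextual bag-of-words view
# and a target-anchored collocation view of a verb's local context. Tokens are
# assumed to be (word, POS) pairs from an upstream tagger.

def contextual_features(tokens, target_index, window=2):
    """Bag of (word, POS) pairs for the words immediately around the target verb."""
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return set(left + right)

def collocation_features(tokens, target_index):
    """Bi-grams anchored on the target verb: (previous word, verb) and (verb, next word)."""
    verb = tokens[target_index][0]
    collocations = []
    if target_index > 0:
        collocations.append((tokens[target_index - 1][0], verb))  # e.g. ("judge", "said")
    if target_index + 1 < len(tokens):
        collocations.append((verb, tokens[target_index + 1][0]))  # e.g. ("said", "yesterday")
    return collocations

# "the judge said yesterday ..." as (word, POS) pairs; the target verb is at index 2.
tokens = [("the", "DT"), ("judge", "NN"), ("said", "VBD"), ("yesterday", "NN")]
print(contextual_features(tokens, 2))
print(collocation_features(tokens, 2))
```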
SenseLearner was evaluated on the English all-words disambiguation task of Senseval-2 (Palmer et al. 2001), and it beat the majority class baselines by a comfortable margin. The broad coverage of SenseLearner makes it particularly attractive to this research, and it will be evaluated against the movie script corpus in chapter 6. However, additional analysis performed on SenseLearner's outputs for the English all-words task of Senseval-2 revealed that its VSD accuracy was 49.5%, which was significantly lower than its overall accuracy of 69.0%. Similar analysis performed on the other systems that took part in the same task further revealed that the majority of these systems also performed significantly worse on verbs than they did on words of other parts-of-speech; SenseLearner's performance on verbs was in fact the highest among all the systems. Since all these WSD systems were either unsupervised or semi-unsupervised, their poor performance on verbs seems to indicate that unsupervised and semi-unsupervised methods may be inadequate for the disambiguation of verbs.

The unsupervised and semi-unsupervised WSD systems of Senseval-2 were not the only ones that performed poorly on verbs. The supervised WSD systems that participated in the English lexical-sample task (Kilgarriff 2001) also had poor performance on verbs. Table 3.6 lists the performances of the systems that participated in this task; it can be observed that all of them performed worse on verbs than on words of other parts-of-speech, and only one of them had a VSD accuracy better than the majority class baseline. Descriptions of these systems show that they differ mainly in the machine learning methods they used, and the vast majority of their features were the general context WSD features described in Section 3.9.2.

System | Overall WSD (Precision, Recall, F-Score) | Verb only WSD (Precision, Recall, F-Score)
UNED LS-U (Fernandez-Amoros et al. 2001) | 0.4, 0.4, 0.4 | 0.145, 0.145, 0.145
UNED LS-T | 0.5, 0.5, 0.5 | 0.170, 0.169, 0.17
WASPS-Workbench (Tugwell and Kilgarriff 2001) | 0.58, 0.32, 0.41 | 0.0, 0.0, 0.0
UMD SST (Cabezas et al. 2001) | 0.57, 0.57, 0.57 | 0.224, 0.224, 0.224
DIMAP (Litkowski 2001) | 0.29, 0.29, 0.29 | 0.080, 0.080, 0.080
IIT 1 (Haynes 2001) | 0.22, 0.22, 0.22 | 0.188, 0.181, 0.184
IIT 2 | 0.23, 0.23, 0.23 | 0.197, 0.191, 0.194
JHU (R) (Yarowsky et al. 2001) | 0.64, 0.64, 0.64 | 0.267, 0.267, 0.267
JHU | 0.57, 0.57, 0.57 | 0.224, 0.224, 0.224
SMUls (Mihalcea and Moldovan 2001) | 0.64, 0.64, 0.64 | 0.243, 0.243, 0.243
KUNLP (Seo et al. 2001) | 0.63, 0.63, 0.63 | 0.254, 0.254, 0.254
CS224N (Ilhan et al. 2001) | 0.62, 0.62, 0.62 | 0.234, 0.234, 0.234
Sinequa-LIA SCT (Crestan et al. 2001) | 0.61, 0.61, 0.61 | 0.233, 0.233, 0.233
TALP (Escudero et al. 2001) | 0.59, 0.59, 0.59 | 0.515, 0.515, 0.515
Alicante | 0.42, 0.41, 0.41 | 0.211, 0.208, 0.209
IRST (Magnini et al. 2001) | 0.67, 0.25, 0.36 | 0.0, 0.0, 0.0
EHU-dlist-all (Agirre and Martinez 2001) | 0.57, 0.56, 0.56 | 0.225, 0.220, 0.220
EHU-dlist-best (Agirre and Martinez 2001) | 0.83, 0.23, 0.36 | 0.193, 0.032, 0.05
Duluth (Pederson 2001) | 0.52, 0.52, 0.52 | 0.202, 0.202, 0.202
Duluth | 0.51, 0.51, 0.51 | 0.180, 0.180, 0.180
Duluth | 0.55, 0.55, 0.55 | 0.205, 0.205, 0.205
Duluth | 0.53, 0.53, 0.53 | 0.197, 0.197, 0.197
Duluth | 0.54, 0.54, 0.54 | 0.196, 0.196, 0.196
Duluth | 0.57, 0.57, 0.57 | 0.210, 0.210, 0.210
Duluth | 0.54, 0.54, 0.54 | 0.203, 0.203, 0.203
Duluth | 0.55, 0.55, 0.55 | 0.202, 0.202, 0.202
Baseline Lesk Corpus | 0.51, 0.51, 0.51 | 0.208, 0.208, 0.208
Baseline Commonest | 0.48, 0.48, 0.48 | 0.411, 0.411, 0.41
Baseline Grouping Lesk Corpus | 0.44, 0.44, 0.44 | 0.147, 0.147, 0.147
Baseline Grouping Commonest | 0.43, 0.43, 0.43 | 0.347, 0.347, 0.347
Baseline Grouping Lesk | 0.27, 0.27, 0.27 | 0.147, 0.147, 0.147
Baseline Grouping Lesk Def | 0.23, 0.23, 0.23 | 0.145, 0.145, 0.145
Baseline Lesk | 0.23, 0.23, 0.23 | 0.181, 0.181, 0.181
Baseline Grouping Random | 0.18, 0.18, 0.18 | 0.091, 0.091, 0.091
Baseline Lesk Def | 0.16, 0.16, 0.16 | 0.091, 0.091, 0.091
Baseline Random | 0.14, 0.14, 0.14 | 0.087, 0.087, 0.087

Table 3.6: Senseval 2 System Performances

The WSD systems reviewed so far all used general context WSD features, and they all performed poorly on verbs. I postulate that the low performance on verb sense disambiguation is due to the fact that none of these systems treated verbs differently from nouns. I believe that in the context of WSD, the most significant difference between verbs and nouns is that verbs are predicating words, whereas nouns are arguments to predicating words. In this respect, a parallel can be drawn between natural languages and programming languages: verbs are like functions, and nouns are like the arguments passed to functions. In programming languages, the number of values that arguments can take is always far greater than the number of available functions, and the situation is the same in natural languages. Therefore, it is more likely for a verb to occur in the surrounding context of a noun than for the same noun to occur in the surrounding context of the verb. The implication of this fact is that the surrounding words of a noun will in general have greater disambiguation power than the surrounding words of a verb.
This also implies that the same surrounding context based WSD methods used for nouns will not be as effective for verbs, as shown in the results of Senseval-2. A more formal statement of this postulation is that, in general, there are more distinct nouns occurring in the context of verbs than there are distinct verbs occurring in the context of nouns. Mathematically, this means that given the two conditional probability distributions P(noun|verb) and P(verb|noun), the entropy of the former should be greater than the entropy of the latter.

An experiment was conducted to verify this postulation. In this experiment, the RASP parser was first used to extract all the verbs and their subjects and direct objects from the British National Corpus. Then, for each verb v_i, the entropies of P(n_j^subj | v_i) and P(n_k^dobj | v_i) were calculated (where n_j^subj is a subject of v_i and n_k^dobj is a direct object of v_i), and for each noun n_i, the entropies of P(v_j | n_i^subj) and P(v_k | n_i^dobj) were calculated. Finally, the sums of the entropies of the same type (P(subj|verb), P(dobj|verb), P(verb|subj) and P(verb|dobj)) were calculated in both a weighted way and an unweighted way (the weights being the normalized frequencies of the relevant verbs and nouns), and compared. Table ?? shows the sums of the entropies: the entropies of P(noun|verb) are consistently higher than the entropies of P(verb|noun), thereby confirming my postulation.
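A minimal sketch of how such a comparison could be computed from (verb, subject, direct object) triples is shown below. It assumes the triples have already been extracted by a parser, and the toy triples, names and interface are illustrative rather than the thesis's actual code or data.

```python
# Sketch of the entropy comparison described above, over (verb, subject, object)
# triples assumed to come from a parser such as RASP. The toy triples are
# illustrative only.
import math
from collections import Counter, defaultdict

def conditional_entropies(pairs):
    """For (x, y) pairs, return {x: H(Y | X = x)} in bits."""
    counts = defaultdict(Counter)
    for x, y in pairs:
        counts[x][y] += 1
    entropies = {}
    for x, y_counts in counts.items():
        total = sum(y_counts.values())
        entropies[x] = -sum((c / total) * math.log2(c / total) for c in y_counts.values())
    return entropies

triples = [("eat", "dog", "bone"), ("eat", "man", "apple"), ("chase", "dog", "cat")]

# Entropy of P(subject noun | verb) per verb, and of P(verb | subject noun) per noun;
# the direct-object distributions can be handled in exactly the same way.
h_subj_given_verb = conditional_entropies([(v, s) for v, s, _ in triples])
h_verb_given_subj = conditional_entropies([(s, v) for v, s, _ in triples])

# Unweighted sum, and a sum weighted by the normalized frequency of each verb.
unweighted_sum = sum(h_subj_given_verb.values())
verb_freq = Counter(v for v, _, _ in triples)
total_verbs = sum(verb_freq.values())
weighted_sum = sum(verb_freq[v] / total_verbs * h for v, h in h_subj_given_verb.items())
print(unweighted_sum, weighted_sum)
```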
Hence, in order to improve the performance of verb sense disambiguation, one must employ deeper semantic analysis than just the surrounding context. One such type of deeper semantic analysis is selectional preferences.

3.9.3 Selectional Preference Based Word Sense Disambiguation

In general, selectional preferences describe the phenomenon that the nouns which can act as a particular argument of a particular predicating word tend to come from the same noun classes. For example, the nouns which can play the eater role for the verb eat ("take in solid food") would tend to belong to the animated thing class, and the nouns which can play the eatee role would tend to belong to the edible class. (Unless otherwise specified, all semantic roles used in this chapter are given platform-independent neutral names such as "eater", "eatee", "kicker" and "kickee".) Furthermore, for the purpose of this thesis, I will extend the general definition of selectional preferences to include nouns which function as adjuncts to predicating words. For example, the verb drive in the sentence "I drove a nail into the wall" stands for "push, propel, or press with force", but in the sentence "I drove a car into the wall" it stands for "operate or control a vehicle". The two instances of drive differ only in their patient semantic role. (In order to simplify the terminology and avoid making premature associations with other syntactic/semantic analysis systems, theories or platforms, I will use the term "semantic role" to refer to both the arguments and adjuncts of verbs in the context of selectional preference based VSD, irrespective of whether such case slots are derived from ...)

More specifically, a selectional preference is a function mapping semantic roles to noun classes. Selectional preferences between predicating words (verbs and adjectives) and their arguments (nouns) are a type of linguistic information which has previously been combined with statistical methods to perform word sense disambiguation (Resnik 1997, McCarthy and Carroll 2003). The basic assumption made by all selectional preference based WSD systems is that the different senses of the same predicating word have different selectional preferences. Using the kick example, the sentences exemplifying each sense may generate the following selectional preferences:

1. "He put his foot on the edge of the boat and kicked the boat away from the dock."


Similar resources

VirBot: A Virtual Reality Robot Driven With Multimodal Commands

In this paper we show how symbolic Artificial Intelligence (AI) techniques can be used to develop intelligent Virtual Reality (VR) environments. We have developed an expert system architecture for building intelligent agents that respond to voice and gesture commands in virtual environments. To demonstrate the utility of this, we present a simple application that allows a user to drive a virtual ...


The More the Merrier: Multi-Party Negotiation with Virtual Humans

The goal of the Virtual Humans Project at the University of Southern California’s Institute for Creative Technologies is to enrich virtual training environments with virtual humans – autonomous agents that support face-to-face interaction with trainees in a variety of roles – through bringing together many different areas of research including speech recognition, natural language understanding,...


From the Virtual to the Real World: Referring to Objects in Real-World Spatial Scenes

Predicting the success of referring expressions (RE) is vital for real-world applications such as navigation systems. Traditionally, research has focused on studying Referring Expression Generation (REG) in virtual, controlled environments. In this paper, we describe a novel study of spatial references from real scenes rather than virtual. First, we investigate how humans describe objects in op...


The instructable virtual agent CoRA

1 Using natural language. The traditional way to interact with virtual environments is through direct physical manipulation using a data glove, space mouse or some other 3D pointing device. Agents are directed via gestures, objects are manipulated by touching them, and metaobjects (menus, buttons, etc.) allow the user to select appropriate actions. Interacting with partially autonomous agents, which tr...


Integration of Geometric and Conceptual Reasoning for Interacting with Virtual Environments

This paper describes the knowledge processing in the CODY Virtual Constructor, an operational system enabling the interactive assembly of complex aggregates in a virtual environment. Two forms of reasoners are used: a geometric reasoner that infers spatial properties of scene objects and a conceptual reasoner that keeps track of the evolving aggregate’s assembly structure. The combination of th...


Grounding language in events

Broadcast video and virtual environments are just two of the growing number of domains in which language is embedded in multiple modalities of rich non-linguistic information. Applications for such multimodal domains are often based on traditional natural language processing techniques that ignore the connection between words and the non-linguistic context in which they are used. This thesis de...




Publication date: 2009